-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
.ne fails if comparing a list of columns containing column name 'dtype' #22383 #22416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -654,6 +654,8 @@ Indexing | |||
- Fixed ``DataFrame[np.nan]`` when columns are non-unique (:issue:`21428`) | |||
- Bug when indexing :class:`DatetimeIndex` with nanosecond resolution dates and timezones (:issue:`11679`) | |||
- Bug where indexing with a Numpy array containing negative values would mutate the indexer (:issue:`21867`) | |||
- Bug Dataframe.ne fails if columns containing column name 'dtype' | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewrite as follows:
:meth:`DataFrame.ne` fails if columns contain column name "dtype" (:issue:`22383`)
- The "meth" syntax enables nicer rendering in our online docs.
- We always reference an issue if possible with our
whatsnew
entries. The syntax is to enable a nicer rendering in our online docs.
pandas/tests/test_expressions.py
Outdated
@@ -442,3 +442,19 @@ def test_bool_ops_warn_on_arithmetic(self): | |||
r = f(df, True) | |||
e = fe(df, True) | |||
tm.assert_frame_equal(r, e) | |||
|
|||
def test_bool_ops_column_name_dtype(self): | |||
# GH 22383 - .ne fails if columns containing column name 'dtype' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: extra space after the dash.
@gfyoung thank you your review :) i modified the code you reviewed. and reason i rename 'dtype' to 'temporary_dtype' is when dataframe has 'dtype' column and call ne() function, dataframe's dtype member variable don't use and occur error as issue 22383. thank you! |
Indeed, I saw! It seems like you have test failures (based on your previous commit). Should look into that. |
@gfyoung thank you your help. |
@gfyoung ah, sorry. i understood now. i will check ci/circleci . thank you :) |
@gfyoung just left continuous intergration test thanks to you. but i don't know how to run that test. |
b5b1660
to
c0d88b8
Compare
Codecov Report
@@ Coverage Diff @@
## master #22416 +/- ##
=======================================
Coverage 31.89% 31.89%
=======================================
Files 166 166
Lines 52421 52421
=======================================
Hits 16722 16722
Misses 35699 35699
Continue to review full report at Codecov.
|
c0d88b8
to
621fb1d
Compare
FYI, http://pandas-docs.github.io/pandas-docs-travis/contributing.html may be helpful. |
pandas/tests/test_expressions.py
Outdated
|
||
def test_bool_ops_column_name_dtype(self): | ||
# GH 22383 - .ne fails if columns containing column name 'dtype' | ||
df_has_error = DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call this df
pandas/tests/test_expressions.py
Outdated
[0, 1, 5, 'bb'], ['cc', 4, 4, 4]], | ||
columns=['a', 'b', 'c', 'd']) | ||
result = df_has_error.loc[:, ['a', 'dtype']].ne(df_has_error.loc[:, | ||
['a', 'dtype']]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make an expected= line
instead of calling the function on a good df, just construct the resulting dataframe directly
pandas/tests/test_expressions.py
Outdated
@@ -442,3 +442,19 @@ def test_bool_ops_warn_on_arithmetic(self): | |||
r = f(df, True) | |||
e = fe(df, True) | |||
tm.assert_frame_equal(r, e) | |||
|
|||
def test_bool_ops_column_name_dtype(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you parameterize on both eq and ne here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback thank you your advice~! i can not understand how to parameterize eq and ne and why do parameterize... sorry i am not good at undertstanding english... thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@baidoosik : This is what @jreback is talking about:
https://docs.pytest.org/en/latest/parametrize.html
Perhaps that will help clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pytest.mark.parametrize("index,has_tz", [
(pd.date_range('2015-01-01 10:00', freq='D', periods=3,
tz='US/Eastern'), True), # datetimetz
(pd.timedelta_range('1 days', freq='D', periods=3), False), # td
(pd.period_range('2015-01-01', freq='D', periods=3), False) # period
])
you recommend like above code style, is it right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! That's right.
Hello @baidoosik! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on December 27, 2018 at 17:46 Hours UTC |
pandas/tests/test_expressions.py
Outdated
@@ -442,3 +441,16 @@ def test_bool_ops_warn_on_arithmetic(self): | |||
r = f(df, True) | |||
e = fe(df, True) | |||
tm.assert_frame_equal(r, e) | |||
|
|||
def test_has_bool_ops_column_name_dtype(self, eq, ne): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm...that's not quite what @jreback was looking for:
https://docs.pytest.org/en/latest/parametrize.html
The idea is that you have only one variable in your signature, and for each possible value of that variable (ne
and eq
in this case), you run the test.
@baidoosik can you merge master and update |
@jreback okay thank you ! i try it |
@jreback now my development environment is not properly working. so i will fix my environment and f test code to use parameterize style. and i will push. thank you! |
@baidoosik can you merge master |
@jreback today i will update ! really sorry ! |
@baidoosik pls merge master and update |
36ab522
to
f0e1fbb
Compare
@jreback today i merge master and apply to parameterize in test code..! thank you. if CI is failed, i will fix it ! |
@@ -160,6 +160,8 @@ def _where_numexpr(cond, a, b): | |||
|
|||
def _has_bool_dtype(x): | |||
try: | |||
if not isinstance(x.dtype, np.dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh? what is this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback i change the code if x is dataframe type, dont access x.dtype. i will wait your review. thank you!
pandas/tests/test_expressions.py
Outdated
]) | ||
def test_bool_ops_column_name_dtype(self, test_input, expected): | ||
# GH 22383 - .ne fails if columns containing column name 'dtype' | ||
assert_frame_equal(test_input.loc[:, ['a', 'dtype']].ne(test_input.loc[:, ['a', 'dtype']]), expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use
result =
expected =
assert_frame_equal(result, expected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback
if i follow your guide, i think code to be long. so, could i declare dataframe variable on the top?
example
_issued_frame = DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'],
[0, 1, 5, 'bb'], [0, 1, 5, 'bb'],
[0, 1, 5, 'bb']], columns=['a', 'b', 'c', 'dtype']
@pytest.mark.parametrize("test_input,expected", [
(_issued_frame.loc[:, ['a', 'dtype']].ne(_issued_frame.loc[:, ['a', 'dtype']]),
DataFrame([[False, False], [False, False],
[False, False], [False, False],
[False, False]], columns=['a', 'dtype']))
])
def test_bool_ops_column_name_dtype(self, test_input, expected):
# GH 22383 - .ne fails if columns containing column name 'dtype'
assert_frame_equal(test_input, expected)
pandas/tests/test_expressions.py
Outdated
@@ -22,6 +22,13 @@ | |||
|
|||
_frame = DataFrame(randn(10000, 4), columns=list('ABCD'), dtype='float64') | |||
_frame2 = DataFrame(randn(100, 4), columns=list('ABCD'), dtype='float64') | |||
_issued_frame = DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are only used once, so just construct them inside the test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback if i construct them on @pytest.mark.parametrize's argument, code is so long. is it okay?
example)
@pytest.mark.parametrize("test_input,expected", [
(DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'],
[0, 1, 5, 'bb'], [0, 1, 5, 'bb'],
[0, 1, 5, 'bb']], columns=['a', 'b', 'c', 'dtype']).
loc[:, ['a', 'dtype']].
ne(DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'],
[0, 1, 5, 'bb'], [0, 1, 5, 'bb'],
[0, 1, 5, 'bb']], columns=['a', 'b', 'c', 'dtype']).
loc[:, ['a', 'dtype']]),
DataFrame([[False, False], [False, False],
[False, False], [False, False],
[False, False]], columns=['a', 'dtype'])),
so, could you give me some tip to good code for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback i remove them. and construct in the test. thank you
columns=['a', 'b', 'c', 'dtype']) | ||
.loc[:, ['a', 'dtype']]), | ||
DataFrame([[False, False], [False, False], | ||
[False, False]], columns=['a', 'dtype'])), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you update as I indicated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback i modify code! is it okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. couple of minor things. ping on green.
pandas/tests/test_expressions.py
Outdated
]) | ||
def test_bool_ops_column_name_dtype(self, test_input, expected): | ||
# GH 22383 - .ne fails if columns containing column name 'dtype' | ||
result = test_input.loc[:, ['a', 'dtype']].\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use parens rather than \
around this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback i changed new line position
return x.dtype == bool | ||
except AttributeError: | ||
try: | ||
if isinstance(x, DataFrame): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use
from pandas.core.dtypes.generic import ABCDataFrame
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback ok! i fixed.
thanks! |
@jreback thanks you so much your kind review! thanks to your kindness, i contributed opensource at first! thank you! |
hello, this is my first pr in open source.
so , i think some things have problem.
if this pr has problem , tell me. i will fix again.
this issue occur Dataframe has column defined 'dtype' when execute .ne function.
i asked some advice which way is more better.
i) when make dataframe instance column name is dtype, make exception.
ii) make exception in ne function.
i receive answer try first. so i try to modifiy _has_bool_dtype function.
git diff upstream/master -u -- "*.py" | flake8 --diff